For the National Walkability Index, we wanted to go beyond the EPA's current Walkability Map, so we created the following interactive spatial visualization of county-level walkability scores and health outcome prevalence. Building it let us synthesize geographic information systems (GIS) concepts and skills from two courses we are taking: Data Science I (BIOST P8105) and Public Health GIS (EHSC P8371).
When you hover over a county, a tooltip displays the county name, its National Walkability Index, and the prevalence (%) of each health outcome. These health outcomes are:

* High cholesterol
* Physical inactivity
* Mental health
* Physical health
* High blood pressure
library(rjson)
library(leaflet)
library(tidyverse)
library(plotly)
# Read in cleaned datafile
walkability = read_csv("./data/merge_final.csv")
# Get the county-level .json file from the web
url = 'https://raw.githubusercontent.com/plotly/datasets/master/geojson-counties-fips.json'
counties = rjson::fromJSON(file = url)
# Set up the geographic projection for the map
g = list(
  scope = 'usa',
  projection = list(type = 'albers usa')
)
# Get the df for mean of walkability index by county
index_by_county = walkability %>%
  mutate(fips_county = str_c(statefp, countyfp)) %>%
  group_by(fips_county) %>%
  summarise(ind_mean = round(mean(nat_walk_ind), 1))
# Get the df for mean of health outcome prevalence by county
hover_test_df = walkability %>%
  mutate(fips_county = str_c(statefp, countyfp)) %>%
  group_by(fips_county, measure_id) %>%
  summarise(county_hlth = round(mean(data_value, na.rm = TRUE), 1)) %>%
  pivot_wider(names_from = "measure_id", values_from = "county_hlth") %>%
  janitor::clean_names()
# Merge the two dfs by county-level FIPS
walkability_map = full_join(x = index_by_county, y = hover_test_df, by = "fips_county")
# Add a variable for the hover text
walkability_map$hover <- with(
  walkability_map,
  paste('<br>',
        "National Walkability Index (by County):", ind_mean, '<br>',
        "% Health Outcomes (by County):", '<br>',
        "High Cholesterol:", highchol, '<br>',
        "Physical Inactivity:", lpa, '<br>',
        "Mental Health:", mhlth, '<br>',
        "Physical Health:", phlth, '<br>',
        "High Blood Pressure:", bphigh))
# Choropleth with hover text of health outcomes by county, via plotly
choropleth_map = plot_ly() %>%
  add_trace(
    type = "choropleth",
    geojson = counties,
    locations = walkability_map$fips_county,
    z = walkability_map$ind_mean,
    text = walkability_map$hover,
    colorscale = "Viridis",
    reversescale = TRUE,  # reverse the color scheme
    zmin = 1,
    zmax = 16,
    marker = list(line = list(width = 0))
  ) %>%
  layout(title = "National Walkability Index in the US",
         geo = g)
choropleth_map
To further understand the National Walkability Index at a smaller spatial unit, we narrowed our exploration to New York City, where the five of us walkaholics reside.
The first step is to get a general understanding of the walkability index across the City by creating a descriptive choropleth map at the census-tract level. We focused on the five counties that make up New York City: Bronx County (the Bronx), Kings County (Brooklyn), New York County (Manhattan), Queens County (Queens), and Richmond County (Staten Island).
library(sf)
library(tmap)
library(tidyverse)
library(leaflet)
library(shinyjs)
# Read-in shapefile
choropleth_NYC <- st_read("./data/cluster_analysis/map_with_everything.shp", quiet = TRUE)
# Clean up and filter
choropleth_NYC = choropleth_NYC %>%
  janitor::clean_names() %>%
  mutate(nat_walk_i = as.numeric(nat_walk_i),
         nat_walk_i = round(nat_walk_i, digits = 1)) %>%
  filter(stusps == "NY",
         namelsadco %in% c("Bronx County", "Kings County", "New York County",
                           "Queens County", "Richmond County")) %>%
  rename(median_income = s1903_c03) %>%
  mutate(median_income = as.numeric(median_income)) %>%
  filter(!is.na(nat_walk_i))  # drop tracts with missing walkability index
# Time to map
tmap_mode("view")
tm_basemap("CartoDB.Positron") +
  tm_shape(choropleth_NYC) +
  tm_polygons(col = "nat_walk_i",
              style = "fisher",  # Fisher-Jenks class breaks
              n = 5,
              title = "National Walkability Index",
              palette = "Greens")
From this choropleth map, we can see that New York City is, overall, quite walkable. However, we want to go further and identify census tracts that are statistically significant hot spots, cold spots, and spatial outliers using cluster analysis.
As mentioned, we want to find NYC census tracts that are statistically significant hot spots, cold spots, and spatial outliers. To start, it is crucial to understand the terms: a hot spot is a tract with a high walkability index surrounded by other high-index tracts (High-High), a cold spot is a low-index tract surrounded by other low-index tracts (Low-Low), and a spatial outlier is a tract whose index differs sharply from that of its neighbors (High-Low or Low-High).
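As a quick refresher, the cluster analysis below rests on Anselin's Local Moran's I statistic. For census tract $i$, it can be sketched as:

$$
I_i = z_i \sum_{j} w_{ij}\, z_j,
$$

where $z_i$ is the standardized (mean-centered and scaled) walkability index of tract $i$, and $w_{ij}$ is the spatial weight between tracts $i$ and $j$ (nonzero only for queen-contiguity neighbors). A significantly positive $I_i$ paired with a high $z_i$ flags a hot spot (High-High); a significantly positive $I_i$ with a low $z_i$ flags a cold spot (Low-Low); a significantly negative $I_i$ flags a spatial outlier (High-Low or Low-High). Significance is assessed by conditional permutation, here with 9,999 permutations. (This is the standard textbook form; the exact internals of rgeoda may differ in minor details such as row-standardization of weights.)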
library(rgeoda)
library(sf)
library(tmap)
library(tidyverse)
library(leaflet)
# Read-in shapefile
cluster_map <- st_read("./data/cluster_analysis/map_with_everything.shp", quiet = TRUE)
# Clean up and filter
cluster_map_NYC = cluster_map %>%
  janitor::clean_names() %>%
  mutate(nat_walk_i = as.numeric(nat_walk_i),
         nat_walk_i = round(nat_walk_i, digits = 1)) %>%
  filter(stusps == "NY",
         namelsadco %in% c("Bronx County", "Kings County", "New York County",
                           "Queens County", "Richmond County")) %>%
  rename(median_income = s1903_c03) %>%
  mutate(median_income = as.numeric(median_income)) %>%
  filter(!is.na(nat_walk_i))  # drop tracts with missing walkability index
# Queen first-order contiguity weights matrix
queen_w <- queen_weights(cluster_map_NYC, order = 1)
# Local Moran's I with 9,999 permutations
lisa <- local_moran(queen_w, cluster_map_NYC["nat_walk_i"],
                    permutations = 9999)
# Get the cluster indicator for each tract and join it to the sf object
cluster_map_NYC$cluster <- as.factor(lisa$GetClusterIndicators())
str(cluster_map_NYC)
# Add labels to the clusters
levels(cluster_map_NYC$cluster) <- lisa$GetLabels()
# We want the GeoDa colors (renamed so we don't shadow the lisa_colors() function)
lisa_palette <- lisa_colors(lisa)
# Recode clusters using tidyverse (dplyr)
cluster_map_NYC = cluster_map_NYC %>%
  mutate(cluster = recode_factor(.x = cluster,
                                 `Undefined` = "Not significant",
                                 `Isolated` = "Not significant"))
# Map the clusters with a basemap, using the standard GeoDa LISA colors
tmap_mode("view")
tm_basemap("CartoDB.Positron") +
  tm_shape(cluster_map_NYC) +
  tm_polygons(col = "cluster",
              palette = c("#eeeeee",   # not significant
                          "#FF0000",   # High-High (hot spot)
                          "#0000FF",   # Low-Low (cold spot)
                          "#a7adf9",   # Low-High (spatial outlier)
                          "#f4ada8"),  # High-Low (spatial outlier)
              alpha = .8,
              title = "Clusters",
              border.col = "black",
              border.alpha = .9)